    Embedding Comparator: Visualizing Differences in Global Structure and Local Neighborhoods via Small Multiples

    Embeddings mapping high-dimensional discrete input to lower-dimensional continuous vector spaces have been widely adopted in machine learning applications as a way to capture domain semantics. Interviewing 13 embedding users across disciplines, we find comparing embeddings is a key task for deployment or downstream analysis, but one that unfolds in a tedious fashion that poorly supports systematic exploration. In response, we present the Embedding Comparator, an interactive system that presents a global comparison of embedding spaces alongside fine-grained inspection of local neighborhoods. It systematically surfaces points of comparison by computing the similarity of the k-nearest neighbors of every embedded object between a pair of spaces. Through case studies, we demonstrate our system rapidly reveals insights, such as semantic changes following fine-tuning, language changes over time, and differences between seemingly similar models. In evaluations with 15 participants, we find our system accelerates comparisons by shifting from laborious manual specification to browsing and manipulating visualizations. Comment: Equal contribution by first two authors.
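    A minimal sketch of the neighborhood-similarity computation the abstract describes, assuming embeddings for a shared set of objects arrive as two NumPy arrays with matching row order. The function names and the Jaccard formulation are illustrative, not the Embedding Comparator's actual implementation.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_sets(embeddings, k):
        # Request k + 1 neighbors because each point is typically its own
        # nearest neighbor; drop that self-match below.
        nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
        _, idx = nn.kneighbors(embeddings)
        return [set(row[1:]) for row in idx]

    def neighborhood_overlap(space_a, space_b, k=10):
        # Jaccard similarity of each object's k-NN sets across two spaces.
        # Low-overlap objects are the points of comparison worth inspecting.
        sets_a, sets_b = knn_sets(space_a, k), knn_sets(space_b, k)
        return np.array([len(a & b) / len(a | b) for a, b in zip(sets_a, sets_b)])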

    VisText: A Benchmark for Semantically Rich Chart Captioning

    Captions that describe or explain charts help improve recall and comprehension of the depicted data and provide a more accessible medium for people with visual disabilities. However, current approaches for automatically generating such captions struggle to articulate the perceptual or cognitive features that are the hallmark of charts (e.g., complex trends and patterns). In response, we introduce VisText: a dataset of 12,441 pairs of charts and captions that describe the charts' construction, report key statistics, and identify perceptual and cognitive phenomena. In VisText, a chart is available as three representations: a rasterized image, a backing data table, and a scene graph -- a hierarchical representation of a chart's visual elements akin to a web page's Document Object Model (DOM). To evaluate the impact of VisText, we fine-tune state-of-the-art language models on our chart captioning task and apply prefix-tuning to produce captions that vary the semantic content they convey. Our models generate coherent, semantically rich captions and perform on par with state-of-the-art chart captioning models across machine translation and text generation metrics. Through qualitative analysis, we identify six broad categories of errors that our models make that can inform future work. Comment: Published at ACL 2023, 29 pages, 10 figures.
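    To make the scene-graph representation concrete, here is a hypothetical sketch of a chart rendered as a hierarchy of nodes, akin to a web page's DOM. The node roles and fields are assumptions for illustration, not VisText's actual schema.

    from dataclasses import dataclass, field

    @dataclass
    class SceneNode:
        role: str                                  # e.g., "chart", "axis", "mark"
        attrs: dict = field(default_factory=dict)  # visual and data properties
        children: list["SceneNode"] = field(default_factory=list)

    chart = SceneNode("chart", {"title": "Monthly Rainfall"}, [
        SceneNode("axis", {"orient": "x", "field": "month"}),
        SceneNode("axis", {"orient": "y", "field": "mm"}),
        SceneNode("marks", children=[
            SceneNode("bar", {"month": "Jan", "mm": 78}),
            SceneNode("bar", {"month": "Feb", "mm": 64}),
        ]),
    ])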

    Assessing the Impact of Automated Suggestions on Decision Making: Domain Experts Mediate Model Errors but Take Less Initiative

    Automated decision support can accelerate tedious tasks as users can focus their attention where it is needed most. However, a key concern is whether users overly trust or cede agency to automation. In this paper, we investigate the effects of introducing automation to annotating clinical texts--a multi-step, error-prone task of identifying clinical concepts (e.g., procedures) in medical notes, and mapping them to labels in a large ontology. We consider two forms of decision aid: recommending which labels to map concepts to, and pre-populating annotation suggestions. Through laboratory studies, we find that 18 clinicians generally build intuition about when to rely on automation and when to exercise their own judgement. However, when presented with fully pre-populated suggestions, these expert users exhibit less agency: accepting improper mentions and taking less initiative in creating additional annotations. Our findings inform how systems and algorithms should be designed to mitigate the observed issues. Comment: Fixed minor formatting.
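    As a rough illustration of the first decision aid (recommending ontology labels for a concept mention), the sketch below ranks candidate labels by simple string similarity. The toy ontology and the similarity measure are stand-ins for illustration; the recommender used in the study's interface is not described here.

    from difflib import SequenceMatcher

    ONTOLOGY = ["appendectomy", "appendicitis", "angioplasty", "anemia"]  # toy example

    def recommend_labels(mention, labels=ONTOLOGY, top_k=3):
        # Rank candidate labels by similarity to the mention text; the
        # clinician still decides whether to accept any suggestion.
        scored = [(SequenceMatcher(None, mention.lower(), label).ratio(), label)
                  for label in labels]
        return [label for _, label in sorted(scored, reverse=True)[:top_k]]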

    Striking a Balance: Reader Takeaways and Preferences when Integrating Text and Charts

    While visualizations are an effective way to represent insights about information, they rarely stand alone. When designing a visualization, text is often added to provide additional context and guidance for the reader. However, there is little experimental evidence to guide designers on how much text to show within a chart, what its qualitative properties should be, and where it should be placed. Prior work also shows variation in personal preferences for charts versus textual representations. In this paper, we explore several research questions about the relative value of textual components of visualizations. 302 participants ranked univariate line charts containing varying amounts of text, ranging from no text (except for the axes) to a written paragraph with no visuals. Participants also described what information they could take away from line charts containing text with varying semantic content. We find that heavily annotated charts were not penalized; in fact, participants preferred the charts with the largest number of textual annotations over charts with fewer annotations or text alone. We also find effects of semantic content. For instance, text that describes statistical or relational components of a chart leads to more takeaways referring to statistics or relational comparisons than text describing elemental or encoded components. Finally, we find different effects for the semantic levels based on the placement of the text on the chart; some kinds of information are best placed in the title, while others should be placed closer to the data. We compile these results into four chart design guidelines and discuss future implications for the combination of text and charts. Comment: 11 pages, 4 tables, 6 figures, accepted to IEEE Transactions on Visualization and Computer Graphics.

    Bluefish: A Relational Framework for Graphic Representations

    Complex graphic representations -- such as annotated visualizations, molecular structure diagrams, or Euclidean geometry -- convey information through overlapping perceptual relations. To author such representations, users are forced to use rigid, purpose-built tools with limited flexibility and expressiveness. User interface (UI) frameworks provide only limited relief as their tree-based models are a poor fit for expressing overlaps. We present Bluefish, a diagramming framework that extends UI architectures to support overlapping perceptual relations. Bluefish graphics are instantiated as relational scenegraphs: hierarchical data structures augmented with adjacency relations. Authors specify these relations with scoped references to components found elsewhere in the scenegraph. For layout, Bluefish lazily materializes necessary coordinate transformations. We demonstrate that Bluefish enables authoring graphic representations across a diverse range of domains while preserving the compositional and abstractional affordances of traditional UI frameworks. Moreover, we show how relational scenegraphs capture previously latent semantics that can later be retargeted (e.g., for screen reader accessibility). Comment: 27 pages, 14 figures.
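    A rough sketch of a relational scenegraph as the abstract characterizes it: a hierarchy augmented with adjacency relations whose scoped references point to nodes elsewhere in the tree. The names and path syntax are assumptions for illustration; Bluefish's actual API differs.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        name: str
        children: list["Node"] = field(default_factory=list)

    @dataclass
    class Relation:
        kind: str        # e.g., "align", "arrow", "background"
        refs: list[str]  # scoped references to nodes elsewhere in the tree

    root = Node("diagram", [
        Node("molecule", [Node("atom-C1"), Node("atom-O1")]),
        Node("annotation", [Node("label-1")]),
    ])
    # An overlap the tree alone cannot express: an annotation label aligned
    # with an atom that lives in a different subtree.
    relations = [Relation("align", ["molecule/atom-O1", "annotation/label-1"])]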

    Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs

    Interpretability methods aim to help users build trust in and understand the capabilities of machine learning models. However, existing approaches often rely on abstract, complex visualizations that poorly map to the task at hand or require non-trivial ML expertise to interpret. Here, we present two visual analytics modules that facilitate an intuitive assessment of model reliability. To help users better characterize and reason about a model's uncertainty, we visualize raw and aggregate information about a given input's nearest neighbors. Using an interactive editor, users can manipulate this input in semantically meaningful ways, determine the effect on the output, and compare against their prior expectations. We evaluate our interface using an electrocardiogram beat classification case study. Compared to a baseline feature importance interface, we find that 14 physicians are better able to align the model's uncertainty with domain-relevant factors and build intuition about its capabilities and limitations.
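    A minimal sketch of the nearest-neighbor idea, assuming inputs are embedded as NumPy vectors: characterize the model's uncertainty on one input by aggregating the labels of its k nearest training examples. The helper below is an assumption for illustration, not the paper's actual interface.

    from collections import Counter
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def neighbor_label_profile(query_emb, train_embs, train_labels, k=5):
        nn = NearestNeighbors(n_neighbors=k).fit(train_embs)
        dists, idx = nn.kneighbors(np.asarray(query_emb).reshape(1, -1))
        # A split label profile (e.g., 3 "normal" vs. 2 "arrhythmia") flags
        # inputs where the model's prediction deserves extra scrutiny.
        return Counter(train_labels[i] for i in idx[0]), dists[0]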